Pre-processing Large Resources for Family Names Research
Abstract
This paper describes the methodology and tools used to preprocess historical archive documents in various formats and convert them to a unified format. The resources were used to investigate the origins and geographical distribution of surnames in the United Kingdom, as part of the Family Names in Britain and Ireland research project. The data extracted from the documents, and the connections between them, proved to be a valuable research resource that helped to speed up the lexicographic work.
Similar resources
Protein Name Tagging for Biomedical Annotation in Text
We explore the use of morphological analysis as preprocessing for protein name tagging. Our method finds protein names by chunking based on a morpheme, the smallest unit determined by the morphological analysis. This helps to recognize the exact boundaries of protein names. Moreover, our morphological analyzer can deal with compounds. This offers a simple way to adapt name descriptions from bio...
Throne Name in the Achaemenid period
The Achaemenid kings after Darius I adopted Darius, Xerxes, or Artaxerxes as their throne name when they were nominated or substituted for succession. Each of these kings chose one of these names according to what happened to him before he reached the throne, how he achieved the throne, and his design and program. These names are not personal and real names, but they ...
The Statistical Analysis of Family Names of Donators For WenChuan
This paper analyzes the family names of individual donors who donated through China Construction Bank Corporation to the Chinese Red Cross Foundation for the WenChuan earthquake. The distribution of family names, the first 100 family names and their shares, the probability of the same family names as well as the Gini coefficient are all given in this paper. A heavy disproportion is shown in the di...
Genenames.org: the HGNC and VGNC resources in 2017
The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. Currently the HGNC database contains almost 40 000 approved gene symbols, over 19 000 of which represent protein-coding genes. In addition to naming genomic loci we manually curate genes into family sets based on shared characteristics such as ho...
Online Processing Redux
The term "online" has become an all-too-common addendum to database system names of the day. In this article we reexamine the notion of processing queries online. We distinguish between online processing and preprocessing, and argue that online processing for large queries requires redesigning major portions of a database system. We highlight pressing applications for truly online processing, a...